Optimal Regularized Dual Averaging Methods for Stochastic Optimization

Authors

  • Xi Chen
  • Qihang Lin
  • Javier Peña
Abstract

This paper considers a wide spectrum of regularized stochastic optimization problems where both the loss function and the regularizer can be non-smooth. We develop a novel algorithm based on the regularized dual averaging (RDA) method that can simultaneously achieve the optimal convergence rates for both convex and strongly convex loss. In particular, for strongly convex loss, it achieves the optimal rate of O(1/N + 1/N^2) for N iterations, which improves on the rate O(log N / N) of previous regularized dual averaging algorithms. In addition, our method constructs the final solution directly from the proximal mapping instead of averaging all previous iterates. For widely used sparsity-inducing regularizers (e.g., the l1-norm), it has the advantage of encouraging sparser solutions. We further develop a multi-stage extension using the proposed algorithm as a subroutine, which achieves the uniformly-optimal rate O(1/N + exp{−N}) for strongly convex loss.
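
To illustrate why constructing the solution through the proximal mapping encourages sparsity under the l1-norm, here is a minimal sketch. It is a generic example rather than the paper's ORDA update; the use of NumPy and the name soft_threshold are assumptions of this illustration. The l1 proximal mapping has a closed form that sets every coordinate below the threshold exactly to zero.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal mapping of x -> tau * ||x||_1:
    argmin_x 0.5 * ||x - z||^2 + tau * ||x||_1.
    Coordinates with |z_i| <= tau become exactly zero, which is why
    prox-based iterates tend to be sparse."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

# Toy usage: small entries are zeroed out, larger ones are shrunk toward zero.
z = np.array([0.05, -0.3, 1.2, -0.8])
print(soft_threshold(z, 0.1))   # [ 0.  -0.2  1.1 -0.7]
```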

Similar articles

Randomized Block Subgradient Methods for Convex Nonsmooth and Stochastic Optimization

Block coordinate descent methods and stochastic subgradient methods have been extensively studied in optimization and machine learning. By combining randomized block sampling with stochastic subgradient methods based on dual averaging ([22, 36]), we present stochastic block dual averaging (SBDA)—a novel class of block subgradient methods for convex nonsmooth and stochastic optimization. SBDA re...

Full text

Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization

We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as the l1-norm for promoting sparsity. We develop extensions of Nesterov's dual averaging method that can exploit the regularization structure in an online setting...

Full text
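
To make the dual averaging update concrete, the sketch below shows a simple RDA-style loop with an l1 regularizer. It is a generic illustration rather than the exact method of any paper listed here: the subgradient oracle grad_oracle, the gamma*sqrt(t) step-size schedule, and the toy regression data are all assumptions of this example.

```python
import numpy as np

def rda_l1(grad_oracle, dim, n_iters, lam=0.1, gamma=1.0):
    """Minimal dual-averaging loop with an l1 regularizer.

    Each step solves in closed form
        x_{t+1} = argmin_x <g_bar_t, x> + lam*||x||_1 + (gamma/(2*sqrt(t)))*||x||^2,
    where g_bar_t is the running average of stochastic subgradients.
    The solution is a soft-thresholding step, so many coordinates are exactly zero.
    """
    x = np.zeros(dim)
    g_sum = np.zeros(dim)                      # running sum of stochastic subgradients
    for t in range(1, n_iters + 1):
        g_sum += grad_oracle(x, t)
        scale = gamma * np.sqrt(t)             # strength of the quadratic prox term
        z = -g_sum / scale
        x = np.sign(z) * np.maximum(np.abs(z) - t * lam / scale, 0.0)
    return x

# Toy usage: sparse least squares with single-sample gradients (data made up for the demo).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
x_true = np.zeros(10)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true + 0.01 * rng.standard_normal(200)
oracle = lambda x, t: A[t % 200] * (A[t % 200] @ x - b[t % 200])
print(rda_l1(oracle, dim=10, n_iters=5000, lam=0.05))
```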

Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems

We consider a composite convex minimization problem associated with regularized empirical risk minimization, which often arises in machine learning. We propose two new stochastic gradient methods that are based on the stochastic dual averaging method with variance reduction. Our methods generate a sparser solution than the existing methods because we do not need to take the average of the history o...

Full text

Iterative parameter mixing for distributed large-margin training of structured predictors for natural language processing

The development of distributed training strategies for statistical prediction functions is important for applications of machine learning, generally, and the development of distributed structured prediction training strategies is important for natural language processing (NLP), in particular. With ever-growing data sets this is, first, because it is easier to increase computational capacity by...

Full text

Reweighted l1 Dual Averaging Approach for Sparse Stochastic Learning

Recent advances in stochastic optimization and regularized dual averaging approaches have revealed substantial interest in simple and scalable stochastic methods tailored to more specific needs. Among the latest, one can find sparse signal recovery and l0-based sparsity-inducing approaches. These methods in particular can force many components of the solution to shrink to zero, thus cla...

Full text

Publication date: 2012